Abstract

During the 1980s and early 1990s the recombinant DNA revolution provided a vital source of therapeutic targets 
and agents for pharmaceutical research. However, during the early 1990s, it became apparent that the identification 
and cloning of novel human cDNAs was a rate-limiting step in drug discovery and that new technological approaches
were required to address the challenge. There was an increasing realisation that the new science of ‘genomics’,
together with the associated large gene sequence databases, would provide a radically new means of generating 
targets. SmithKline Beecham has been at the forefront of this breakthrough in pharmaceutical research. The 
productivity of this strategy is illustrated by reference to our work on novel enzymes, chemokines and receptors and 
new approaches linking genes to pathological processes. © 2000 Elsevier Science B.V. All rights reserved.

1. Database rationale

The effective exploitation of information in a
gene or EST (expressed sequence tag) database
depends on the rationale for cDNA library construction,
the biological/structural reasoning driving
the searches and the sophistication of the
bioinformatic tools employed. We have invested
in a major initiative in bioinformatics to organise
and interrogate this information (Marshall, 1996)
and have attempted to integrate the EST-based
strategies into all aspects of drug discovery. In the
following sections examples are provided to illustrate
how this information has made a critical
contribution to our drug discovery programmes.

At the outset of this project, it was clear that 
creation of an EST database was only the start of 
the process to design new effective and selective 
therapies. The next key issue was identifying 
within the database those genes that were most 
relevant to the drug discovery process.

2. Osteoporosis

Tissue-specific expression of genes can provide 
clues to their role in pathology. One method of 
searching for differences in gene expression is to 
interrogate a large collection of ESTs from a 
variety of cDNA libraries. 
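
As an illustration only, the kind of cross-library comparison described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the bioinformatic tooling used in this work: the function names, the thresholds (`min_fraction`, `min_fold`) and the toy library contents are all hypothetical, and each EST is assumed to have already been assigned to a gene.

```python
from collections import Counter


def est_fractions(library_ests):
    """Map each library to {gene: fraction of that library's ESTs}."""
    fractions = {}
    for library, genes in library_ests.items():
        counts = Counter(genes)
        total = sum(counts.values())
        fractions[library] = {gene: n / total for gene, n in counts.items()}
    return fractions


def enriched_in(fractions, target, min_fraction=0.01, min_fold=10.0):
    """List (gene, fraction) pairs abundant in `target` and rare elsewhere."""
    hits = []
    for gene, frac in fractions[target].items():
        if frac < min_fraction:
            continue
        # Highest abundance of this gene in any other library (0.0 if absent).
        elsewhere = max(
            (f.get(gene, 0.0) for lib, f in fractions.items() if lib != target),
            default=0.0,
        )
        if elsewhere == 0.0 or frac / elsewhere >= min_fold:
            hits.append((gene, frac))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data: each list entry is the gene assigned to one EST.
toy = {
    "library_A": ["gene_x"] * 4 + ["housekeeping"] * 96,
    "library_B": ["housekeeping"] * 100,
}
print(enriched_in(est_fractions(toy), "library_A"))
```

In this toy case only `gene_x` is reported for `library_A`, because the housekeeping transcript is equally abundant in the other library. Real EST collections would additionally need normalisation for library size and sampling depth, which the sketch ignores.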

2.1. Osteoclast-specific cysteine proteinase
(cathepsin K)

Osteoporosis is an increasingly important disease
in an ageing population. Normally bone is
continually remodelled; however, when this process
fails to balance bone resorption against
regeneration, bone loses mass and
becomes brittle. We were interested in finding
means to prevent this breakdown of bone.

Osteoclasts are implicated in bone resorption.
A cDNA library was constructed from an
osteoclastoma and was EST-profiled. Bioinformatic
searches revealed an abundant transcript
that accounted for 4% of all ESTs and, by homology,
identified it as a novel member of the cysteine
proteinase class (Drake et al., 1996). This novel
gene was shown by in situ hybridisation to be
expressed in osteoclasts but not osteoblasts,
confirming database information on specificity of
tissue distribution. Subsequent immunohistochemical
studies also confirmed the localisation of
expression. We are currently evaluating the therapeutic
potential of this target in bone resorption
by making specific small molecule inhibitors.

Whilst we had identified the cathepsin K gene 
by the ‘database-first’ strategy, it is interesting to 
note that the critical role of this gene in bone 
biology has been supported by the work of Gelb 
et al. (1996) in a genetic-based study of pycnodysostosis.
This is a rare autosomal recessive
disease manifested by osteosclerosis and short 
stature. Gelb et al. were able to locate defects in 
the same gene in three ethnically disparate groups. 

2.2. Atherosclerosis 

Atherosclerosis is a major cause of morbidity
and mortality. Despite recent therapeutic progress,
e.g. in the area of HMGCoA reductase
inhibitors, it is clear that further strategies to
control pathological events, especially in vascular
wall biology, are required. We were therefore
interested in identifying other potential targets.

2.3. Novel lipoprotein-associated phospholipase 

Oxidation of low density lipoprotein and production
of lysophosphatidyl choline is a key event
in the development of atherosclerosis. Lysophosphatidyl
choline is a powerful chemoattractant for
monocytes and also induces expression of endothelial
leukocyte adhesion molecules; inhibition
of its formation might therefore be expected to
have therapeutic benefit. Hence, we began a
search for the phospholipase responsible for generating
lysophosphatidyl choline. It was known
that plasma PLA2 levels are raised in familial
hypercholesterolaemia and this provided the starting
point for what initially began as a conventional
cDNA cloning project. A PLA2 activity was
purified from patient plasma, and microsequencing of
this protein enabled a classical oligonucleotide
cDNA library probing strategy to be initiated.
However, the project was considerably accelerated
‘midstream’ as the EST database became available
(Tew et al., 1996). The enzyme was then overexpressed
using a baculovirus system and used as
the basis for a high throughput screen for potent
inhibitors.

Though the gene was identified via a microsequencing
project, we noted that the chromosomal
breakpoint in a family with autosomal dominant
supravalvular aortic stenosis falls within the 5'
untranslated region of the LDL-PLA2 gene, lending
further, indirect support for an association
between this novel protein and cardiovascular
pathology.

3. Seven transmembrane G-protein coupled 
receptors 

The seven-transmembrane receptors (7tms) are 
a large and diverse family of proteins and have 
been the target of many successful drugs. Many 
new members of the 7tm family were discovered
using low stringency probing and/or PCR
methodologies. However, this approach is limited
by the signal-to-noise ratio inherent in hybridisation
or PCR-based strategies. By contrast, homology
searches in silico can access a wider range of
sequence divergence. We have identified many 
dozens of new receptors via the EST paradigm. 

Once a potential receptor has been located it is 
necessary to define function. Using Xenopus 
oocyte and mammalian cell expression systems we 
were rapidly able to identify receptors for calcitonin
gene-related peptide (CGRP) and the complement
C3a component (an important mediator
of the inflammatory response). 

More recently we have been able to define a 
new family of neuropeptides, known as the ‘orexins’
(Sakurai et al., 1998). Orexin A and B are
derived from the same polypeptide precursor by 
proteolytic processing and recognise two closely 
related receptors of this class. The precursor 
mRNA is localised to neurons within and around 
the lateral and posterior hypothalamus in the 
adult rat brain. In vivo experiments suggest that
these neuropeptides play a key role in regulation
of feeding behaviour.

4. Gene expression monitoring 

The examples given above have focused on 
genes that have been identified through probing 
of the database for a match to a novel peptide 
sequence or for homology with known protein 
families or searching for genes which have a selective
tissue distribution. An alternative approach is
to use this large gene collection directly to interface
with mRNA (expression profiling) projects,
to identify disease genes for use either as therapeutic
targets or as diagnostic markers.

High density gene micro-arrays have considerable
potential for monitoring of gene expression
patterns in normal and diseased tissues (Ermolaeva
et al., 1998; Debouck and Goodfellow,
1999). We are currently evaluating and validating
high-throughput instrumentation designed to increase
the sensitivity of the technology and speed
the generation of data. Of particular importance
is the ability to generate a sufficiently large data
set to be able to distinguish between inter-individual
variation and the variation underlying overt
pathology.
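
To make the last point concrete, one simple way of weighing a disease-associated difference against inter-individual scatter is a two-sample statistic such as Welch's t. The following Python sketch is offered only as an illustration and is not a description of the analysis used in this work; the function name and the example expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(normal, diseased):
    """Welch's t-statistic for one gene across two groups of individuals.

    The numerator is the difference between group means; the denominator
    reflects the scatter between individuals within each group. Each group
    needs at least two samples, and the scatter must not be zero.
    """
    standard_error = sqrt(
        variance(normal) / len(normal) + variance(diseased) / len(diseased)
    )
    return (mean(diseased) - mean(normal)) / standard_error


# Hypothetical expression levels for a single gene (arbitrary units).
normal_tissue = [1.0, 2.0, 3.0]
diseased_tissue = [4.0, 5.0, 6.0]
print(round(welch_t(normal_tissue, diseased_tissue), 3))
```

With only a handful of individuals per group the standard error is large, so a sizeable difference in means may still be indistinguishable from individual variation; larger data sets reduce the denominator and sharpen the comparison.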

5. The phenotype gap 

The EST database centred strategy has provided
many high value targets for drug discovery.
As the human and mouse genome projects accelerate,
the prospects for positional or candidate
gene cloning are rapidly improving, making it
more feasible to refocus on the search for genes
underlying pathological processes. Mouse mutants
have proved to be very useful in this respect,
as evidenced by the discovery of leptin. However,
it has been recognised that the number of different
phenotypes available in the form of mouse
mutations is quite limited. As a result, in collaboration
with MRC Harwell, we have set up a major
programme to create novel mouse strains using
ENU mutagenesis. To date, over 8000 mice have
been generated and screened and a wide range of
novel phenotypes has been recovered. More details
are available at http://www.har.mrc.ac.uk/mutabase/.
Once phenotypically validated, these mutants
will potentially provide a powerful resource for
positional gene cloning.

6. Future prospects 

The genomics strategy has already delivered
many targets that are currently being followed up
within our drug discovery pipeline. It can be
anticipated that this trend will continue and will
be complemented with technologies for directly
interfacing novel cloned genes with biological systems
to determine which enzymes or receptors are
important in a particular disease process and
which may be beneficially activated or inhibited
by pharmacological intervention. Moreover, improved
linking between therapeutics and diagnostics
will permit both more selective targeting of
therapy in the clinic and more rapid establishment
of ‘proof of concept’ in clinical trials.